Supervised Semantic Indexing for Ranking Documents

Authors

  • Bing Bai
  • Jason Weston
  • Ronan Collobert
  • David Grangier
Abstract

Ranking text documents given a query is one of the key tasks in information retrieval. Typical solutions include classical vector space models using weighted word counts and the cosine similarity (TFIDF) with no machine learning at all, or Latent Semantic Indexing (LSI) using unsupervised learning to learn a low dimensional space of “latent concepts” via a reconstruction objective. The former assumes independence of words and cannot capture synonymy or polysemy, whilst the latter is still agnostic to the actual task of interest.
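
To make the classical baseline concrete, the following is a minimal sketch of TF-IDF ranking with cosine similarity. It is not taken from the paper: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems do more.
    return text.lower().split()


def idf_table(docs):
    # Assumed smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # Weighted word counts; terms outside the corpus vocabulary are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    idf = idf_table(docs)
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical toy corpus: documents 0 and 1 describe the same event
# with different words.
docs = [
    "the car was repaired at the garage",
    "the automobile was fixed by a mechanic",
    "stock prices fell sharply today",
]
ranking = rank("car repair garage", docs)
```

In this toy corpus the second document shares no term with the query, so its cosine score is zero even though it describes the same event. This is the synonymy limitation the abstract attributes to word-independent vector space models.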

Similar resources

Role of semantic indexing for text classification

The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...

Enriching Ontologies with Encyclopedic Background Knowledge for Document Indexing

The rapidly increasing number of scientific documents publicly available on the Internet creates the challenge of efficiently organizing and indexing these documents. Due to the time-consuming and tedious nature of manual classification and indexing, there is a need for better methods to automate this process. This thesis proposes an approach which leverages encyclopedic background knowledge fo...

Efficient semantic indexing via neural networks with dynamic supervised feedback

We describe a portable system for efficient semantic indexing of documents via neural networks with dynamic supervised feedback. We initially represent each document as a modified TF-IDF sparse vector and then apply a learned mapping to a compact embedding space. This mapping is produced by a shallow neural network which learns a latent representation for the textual graph linking words to nearby...

Supervised Locality Preserving Indexing for Text Categorization

A major characteristic of text categorization problems is the prohibitively high dimensionality of the feature space. Most discrimination methods cannot work under such conditions, so Latent Semantic Indexing (LSI) has been adopted to solve this problem. However, LSI is not an optimal representation for the text categorization task, mainly for two reasons: first, the discriminative categorical info...

An Enhanced Indexing And Ranking Technique On The Semantic Web

With the fast growth of the Internet, more and more information is available on the Web. The Semantic Web has many features which cannot be handled by traditional search engines. It extracts metadata for each discovered Web document in RDF or OWL format, and computes relations between documents. We propose a hybrid indexing and ranking technique for the Semantic Web which finds rel...

Publication date: 2009